The MULI Project: Annotation and Analysis of Information Structure in German and English
Abstract
The goal of the MULI (MUltiLingual Information structure) project is to empirically analyse information structure in German and English newspaper texts. In contrast to other projects in which information structure is annotated and investigated (e.g. in the Prague Dependency Treebank, which mirrors the basic information about the topic-focus articulation of the sentence), we do not annotate theory-biased categories like topic-focus or theme-rheme. Trying to be as theory-independent as possible, we annotate those features which are relevant to information structure and on the basis of which typical patterns, co-occurrences or correlations can be determined. We distinguish between three annotation levels: syntax, discourse and prosody. The data is based on the TIGER Corpus for German and the Penn Treebank for English, since the existing information on part-of-speech and syntactic structure can be re-used for our purposes. The actual annotation of an English example sequence illustrates our choice of categories on each level. Their combination offers the possibility to investigate how information structure is realised and can be interpreted.
Similar resources
A Systematic Evaluation of Concept-based Cross-Lingual Information Retrieval in the Medical Domain
The paper describes experiments and results of the MuchMore project, which is concerned with a systematic comparison of concept-based and corpus-based methods in cross-language information retrieval (CLIR) in the medical domain. Primary goals of the project are to develop and evaluate methods for the effective use of multilingual thesauri in the semantic annotation of English and German medica...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
Discourse-Level Annotation For Investigating Information Structure
We present discourse-level annotation of newspaper texts in German and English, as part of an ongoing project aimed at investigating information structure from a cross-linguistic perspective. Rather than annotating some specific notion of information structure, we propose a theory-neutral annotation of basic features at the levels of syntax, prosody and discourse, using treebank data as a start...
PERFORMANCE-BASED SEISMIC DESIGN OPTIMIZATION FOR MULI-COLUMN RC BRIDGE PIERS, CONSIDERING QUASI-ISOLATION
In this paper an optimization framework is presented for automated performance-based seismic design of bridges consisting of multi-column RC pier substructures. The beneficial effects of fusing components on seismic performance of the quasi-isolated system is duly addressed in analysis and design. The proposed method is based on a two-step structural analysis consisting of a linear modal dynami...
Projecting Temporal Annotations Across Languages
This thesis investigates the use of parallel corpora for the annotation of temporal objects and relations. In particular, we employ existing tools for the temporal analysis of English to annotate the English portion of an English-German bitext, and automatically project these annotations to the German text, guided by word alignment. Projection-based approaches to multilingual annotation have pr...
Publication date: 2004